A Survey of Genomic Traces Reveals a Common Sequencing Error, RNA Editing, and DNA Editing

Authors

  • Alexander Wait Zaranek
  • Erez Y. Levanon
  • Tomer Zecharia
  • Tom Clegg
  • George M. Church
Abstract

While it is widely held that an organism's genomic information should remain constant, several protein families are known to modify it. Members of the AID/APOBEC protein family can deaminate DNA. Similarly, members of the ADAR family can deaminate RNA. Characterizing the scope of these events is challenging. Here we use large genomic data sets, such as the two billion sequences in the NCBI Trace Archive, to look for clusters of mismatches of the same type, which are a hallmark of editing events caused by APOBEC3 and ADAR. We align 603,249,815 traces from the NCBI Trace Archive to their reference genomes. In clusters of mismatches of increasing size, at least one systematic sequencing error (G-to-A) dominates the results. It is still present in mismatches with 99% accuracy and only vanishes in mismatches at 99.99% accuracy or higher. The error appears to have entered into about 1% of the HapMap, possibly affecting other users that rely on this resource. Further investigation, using stringent quality thresholds, uncovers thousands of mismatch clusters with no apparent defects in their chromatograms. These traces provide the first reported candidates of endogenous DNA editing in human, further elucidating RNA editing in human and mouse and also revealing, for the first time, extensive RNA editing in Xenopus tropicalis. We show that the NCBI Trace Archive provides a valuable resource for the investigation of the phenomena of DNA and RNA editing, as well as setting the stage for a comprehensive mapping of editing events in large-scale genomic datasets.

Citation: Zaranek AW, Levanon EY, Zecharia T, Clegg T, Church GM (2010) A Survey of Genomic Traces Reveals a Common Sequencing Error, RNA Editing, and DNA Editing. PLoS Genet 6(5): e1000954. doi:10.1371/journal.pgen.1000954

Editor: Dirk Schübeler, Friedrich Miescher Institute for Biomedical Research, Switzerland

Received August 20, 2009; Accepted April 15, 2010; Published May 20, 2010

Copyright: © 2010 Zaranek et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: EYL was supported by the Machiah foundation. Funding came from a National Human Genome Research Institute Centers of Excellence in Genomic Science grant to GMC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected] (AWZ); [email protected] (EYL)

These authors contributed equally to this work.
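The search for clusters of same-type mismatches under a quality threshold, as described in the abstract, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the gap-free pairwise input, and the rule "the most frequent mismatch type must occur at least `min_size` times" are all assumptions made for illustration. The only outside fact used is the standard Phred relation Q = -10·log10(1 - accuracy), under which 99% accuracy corresponds to Q20 and 99.99% to Q40.

```python
import math
from collections import Counter


def phred_from_accuracy(accuracy):
    """Phred score for a base-call accuracy: Q = -10 * log10(1 - accuracy)."""
    return -10 * math.log10(1 - accuracy)


def mismatch_types(reference, read, qualities, min_quality):
    """Count mismatches by type (e.g. 'G>A') in a gap-free alignment.

    Hypothetical helper: `reference` and `read` are equal-length strings,
    `qualities` holds one Phred score per read base, and bases below
    `min_quality` are ignored.
    """
    counts = Counter()
    for ref_base, read_base, quality in zip(reference, read, qualities):
        if ref_base != read_base and quality >= min_quality:
            counts[f"{ref_base}>{read_base}"] += 1
    return counts


def dominant_cluster(counts, min_size):
    """Return (type, count) for the most frequent mismatch type if it
    occurs at least `min_size` times, else None (assumed cluster rule)."""
    if not counts:
        return None
    kind, n = counts.most_common(1)[0]
    return (kind, n) if n >= min_size else None


# Toy alignment with three G-to-A mismatches, all called at Q40.
reference = "ACGTGGAGGT"
read = "ACATGAAGAT"
qualities = [40] * len(read)

threshold = round(phred_from_accuracy(0.99))  # 99% accuracy
counts = mismatch_types(reference, read, qualities, threshold)
print(dominant_cluster(counts, min_size=3))
```

Raising the threshold to `round(phred_from_accuracy(0.9999))` mirrors the stricter 99.99% setting mentioned above; a systematic error that survives the lower threshold but not the higher one would disappear from `counts` at that point.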


Similar resources


Whole Exome Sequencing Reveals a BSCL2 Mutation Causing Progressive Encephalopathy with Lipodystrophy (PELD) in an Iranian Pediatric Patient

Background: Progressive encephalopathy with or without lipodystrophy is a rare autosomal recessive childhood-onset seipin-associated neurodegenerative syndrome, leading to developmental regression of motor and cognitive skills. In this study, we introduce a patient with developmental regression and autism. The causative mutation was found by exome sequencing. Methods: The proband showed a gener...


I-37: Establishing High Resolution Genomic Profiles of Single Cells Using Microarray and Next-Generation Sequencing Technologies

The nature and pace of genome mutation is largely unknown. Standard methods to investigate DNA-mutation rely on arraying or sequencing DNA from a population of cells, hence the genetic composition of individual cells is lost and de novo mutation in cell(s) is concealed within the bulk signal. We developed methods based on (SNP-) arraying and next-generation sequencing of single-cell whole-genom...


I-39: Exome Sequencing Reveals New Genes Involved in Human Infertility


Whole Exome Sequencing Reveals a XPNPEP3 Novel Mutation Causing Nephronophthisis in a Pediatric Patient

Background: Nephronophthisis (NPHP) is a progressive tubulointerstitial kidney condition with an autosomal recessive (AR) inheritance pattern. To date, more than 20 genes have been identified for NPHP, with NPHP1 the first one detected. Mutation of X-prolyl aminopeptidase 3 (XPNPEP3) is associated with NPHP-like 1 nephropathy and late-onset NPHP. Methods: The proband (index patient) had polyuria, polyd...



Journal: PLoS Genet

Volume 6, Issue 5

Pages: e1000954

Publication date: 2010